occupancy measure
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
- Information Technology > Artificial Intelligence > Robots (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Research Report > New Finding (0.67)
- Overview (0.67)
Reviewer #1
Q1: ...the claim that the algorithm really manages to align the latent distributions of real and simulated data... We will revise the inappropriate statements in the final version.
Q2: In the model adaptation phase, are state-action pairs simply sampled randomly from their respective buffers?
Do you have results for a single, monolithic model?
Q4: Did you investigate the reasons for the slow learning in the 500 steps on InvertedPendulum compared to PETS?
Q1: The experiments shown in Figure 2 do not outperform MBPO beyond the confidence bounds.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
Imitation with Neural Density Models
We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Maximum Occupancy Entropy Reinforcement Learning (RL) using the density as a reward. Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator. We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks.
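The first stage described above, estimating the expert's occupancy measure and turning that estimate into a reward, can be illustrated with a minimal sketch. This is not the NDI implementation: a plain Gaussian kernel density estimate stands in for the neural density model, the reward is taken to be the log of the estimated density (an assumption), and the entropy-regularized RL stage is omitted. The names `fit_kde`, `density_reward`, `expert_pairs`, and the bandwidth value are hypothetical.

```python
import math
import random


def fit_kde(samples, bandwidth):
    """Return a log-density function for an isotropic Gaussian KDE.

    Stand-in (assumption) for a learned neural density model over
    expert state-action pairs.
    """
    n = len(samples)
    d = len(samples[0])
    # Log of the normalizing constant 1 / (n * (bandwidth * sqrt(2*pi))**d).
    log_norm = -math.log(n) - d * (math.log(bandwidth) + 0.5 * math.log(2.0 * math.pi))

    def log_density(x):
        # One Gaussian exponent per expert sample.
        exps = [
            -0.5 * sum((xi - si) ** 2 for xi, si in zip(x, s)) / bandwidth ** 2
            for s in samples
        ]
        m = max(exps)  # log-sum-exp trick for numerical stability
        return log_norm + m + math.log(sum(math.exp(e - m) for e in exps))

    return log_density


def density_reward(log_density, state, action):
    """Reward for a state-action pair: log of the estimated expert density."""
    return log_density(tuple(state) + tuple(action))


# Hypothetical expert data: 1-D states near 1.0 paired with 1-D actions near -1.0.
rng = random.Random(0)
expert_pairs = [(rng.gauss(1.0, 0.1), rng.gauss(-1.0, 0.1)) for _ in range(200)]
log_rho_expert = fit_kde(expert_pairs, bandwidth=0.2)

# Pairs the expert visits often score higher than pairs it never visits.
r_near = density_reward(log_rho_expert, (1.0,), (-1.0,))
r_far = density_reward(log_rho_expert, (3.0,), (2.0,))
```

In a full pipeline, a reward of this kind would be handed to an RL algorithm that also maximizes the entropy of the imitator's own occupancy measure. The sketch stops before that stage.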